An Analysis of NFL Offensive Stats

Data Science 1 with R (STAT 301-1)

Author

Sydney Newton

Published

December 6, 2023

Github Repo Link

Introduction

Since I was a little kid, I have been a huge fan of the Kansas City Chiefs. I’ve watched the team cycle through many different players and have many up-and-down seasons. I have also followed the NFL for several years, so I chose an NFL-centered dataset.

When looking at the data, my main goal is to analyze NFL stats at the player level. I have seen many different types of players, so I am interested in what the stats look like for each position and how they have changed over the four years from 2019 to 2022. I am also interested in how player stats contribute to a team’s performance.

Data Overview & Quality

I am using a data set from kaggle.com that shows the offensive stats for every offensive player in every NFL game from 2019 to 2022. The data set has 69 columns that cover a variety of stats for different positions, and 19,973 observations. Each observation corresponds to one player’s stats during one NFL game within those 4 years. There are both categorical and numerical variables, although the numerical variables far outnumber the categorical ones.

There were missingness issues for only one variable, “Vegas Favorite”. This is the variable that says if the player was considered a favorite in the stats for betting, so it’s not entirely surprising that some of its values are missing. I don’t believe the missing values in this one variable will impact the analysis.
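A minimal sketch of how a per-column missingness check like this could be run. It is written in base R so it runs without extra packages, and the toy data frame, its values, and the column name `vegas_favorite` are hypothetical stand-ins for the real data.

```r
# Toy stand-in for the game log; names and values are hypothetical.
games <- data.frame(
  player         = c("A", "B", "C", "D"),
  rec_yds        = c(12, 0, 85, 40),
  vegas_favorite = c("KAN", NA, "SEA", NA),
  stringsAsFactors = FALSE
)

# Count missing values in every column, then keep only the columns that have any.
na_counts <- colSums(is.na(games))
cols_with_missing <- na_counts[na_counts > 0]
print(cols_with_missing)
```

If only one column name comes back, the missingness is confined to that variable, which is the situation described above.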

There was a little bit of cleaning that had to go into this data set. There were a few defensive players included in the dataset, so I removed those. I also removed a few columns that were not necessary to my data exploration, such as “player_id”, “team_abbr”, and a few stats that did not relate to offensive player performance. Lastly, I mutated the game id column so it just showed the year of the game, rather than the long id.
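The cleaning steps above could look roughly like the sketch below. This is not the code used for the report: the toy rows, the list of defensive positions, and the assumption that each game id starts with a four-digit year are all illustrative guesses, and it uses base R rather than the tidyverse so it runs on its own.

```r
# Toy rows; the id format (leading four-digit year) is an assumption.
raw <- data.frame(
  game_id   = c("201909080crd", "202011290tam", "202201020cin"),
  player_id = c("p1", "p2", "p3"),
  player    = c("Player A", "Player B", "Player C"),
  position  = c("WR", "DE", "QB"),
  team_abbr = c("ARI", "TAM", "CIN"),
  rec_yds   = c(54, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical list of defensive position codes to drop.
defensive_positions <- c("DE", "DT", "LB", "CB", "S")

# 1. Remove defensive players.
cleaned <- raw[!raw$position %in% defensive_positions, ]

# 2. Reduce the game id to the year of the game.
cleaned$year <- as.integer(substr(cleaned$game_id, 1, 4))

# 3. Drop columns that are not needed for the exploration.
cleaned <- cleaned[, !names(cleaned) %in% c("game_id", "player_id", "team_abbr")]
print(cleaned)
```

Because the year comes from the game date, a game played in January is labeled with the new calendar year rather than the season it belongs to.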

Explorations

Receiving Analysis

First, I want to look at receiving.

Frequency of Receiving Yards

To begin, I am curious about what the distribution of receiving yards looks like. There are two positions that receive: wide receivers and tight ends. I want to look at the distribution for the overall stats.

Figure 1: The Distribution of Overall Receiver Yards per Game for Both Tight Ends and Wide Receivers

Figure 1 shows that the majority of receiving yards lie in the 0-50 range. However, the graph has a right skew, and there are outliers, which are players with more than 150 yards in a game.

Next, I want to look at the distribution of overall stats by position.

Figure 2: The Distribution of Overall Receiver Yards per Game for Tight Ends vs Wide Receivers

Figure 2 reveals that Tight Ends have lower overall receiving stats than wide receivers. This is not surprising, since tight ends have two jobs: blocking and receiving. Not all tight ends catch, but all wide receivers catch. This difference in position nature explains the difference in the range of receiving yards between the two positions.

Wide Receiver Receiving Yards Across Years

Next, I want to explore how the receiving yards stats have changed across years. Since tight ends and wide receivers are fairly different positions, I am going to explore the positions separately. First, I am going to look at wide receivers.

year mean_rec_yds
2019 33.55330
2020 34.26984
2021 31.89035
2022 32.30046

The receiving yards by year of wide receivers show a unique trend. The average went up in 2020, but then went down in 2021 and stayed lower in 2022. However, the average has not changed much, ending about 1.3 yards lower in 2022 than in 2019. This suggests that the emphasis on throwing in the NFL has remained fairly consistent, and that the talent in the NFL has also stayed consistent.
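A yearly average table like the one above can be produced with a grouped mean. The sketch below uses base R's `aggregate()` on made-up numbers. The report itself may well have used dplyr's `group_by()` and `summarise()` instead, which would give the same result.

```r
# Toy wide receiver game log; values are made up for illustration.
wr_games <- data.frame(
  year    = c(2019, 2019, 2020, 2020, 2020),
  rec_yds = c(20, 40, 10, 35, 60)
)

# Mean receiving yards per game, one row per year.
mean_by_year <- aggregate(rec_yds ~ year, data = wr_games, FUN = mean)
names(mean_by_year)[2] <- "mean_rec_yds"
print(mean_by_year)

# Change between the first and last year in the table.
change <- mean_by_year$mean_rec_yds[nrow(mean_by_year)] - mean_by_year$mean_rec_yds[1]
print(change)
```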

Figure 3: The Distribution of Receiving Yards in 2019, 2020, 2021, and 2022 for Wide Receivers

The range in Figure 3 is also fairly consistent year by year. There seem to be many outliers each year, and there is one extreme outlier in both 2020 and 2022.
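To make "outlier" concrete, here is a small sketch of the rule most boxplots use, assuming Figure 3 is a boxplot drawn with the common 1.5 × IQR whisker convention (this is an assumption about the figure, and the yardage values are made up).

```r
# Made-up single-game receiving totals.
rec_yds <- c(0, 5, 12, 18, 25, 31, 40, 47, 55, 269)

# Quartiles and the upper fence: Q3 + 1.5 * IQR.
q <- quantile(rec_yds, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

# Games above the fence would be drawn as individual outlier points.
outliers <- rec_yds[rec_yds > upper_fence]
print(upper_fence)
print(outliers)
```

Because most games fall in a narrow low range, the fence sits well below the biggest games, so a right-skewed stat like receiving yards naturally produces many outlier points each year.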

Top Wide Receivers

Now, I am interested in looking at which players account for these 2 outliers.

player position year team rec_yds
Tyreek Hill WR 2020 KAN 269
Tyler Lockett WR 2020 SEA 200
player position year team rec_yds
Ja'Marr Chase WR 2022 CIN 266
Gabriel Davis WR 2022 BUF 201

After looking into it more, I found that the outlier in 2020 was Tyreek Hill and the outlier in 2022 was Ja’Marr Chase.

After exploring further, I learned that Tyreek Hill got 269 receiving yards while he played for the Kansas City Chiefs in a game against the Tampa Bay Buccaneers. I personally remember watching this game, and I recalled it as soon as I saw the outlier, which is why I wanted to explore further. Ja’Marr Chase got 266 receiving yards for the Cincinnati Bengals in a game against the Kansas City Chiefs.

Both of these games set records, with Hill holding the 14th most receiving yards in a single game of all time and Chase holding the 16th most. Taking these stats into consideration, it makes sense that the outliers look so significant in the distribution.
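One way to find the games behind the outliers is to sort each year's games by receiving yards and keep the top few. The helper `top_games()` below is a hypothetical name, and the players and values are placeholders rather than rows from the dataset.

```r
# Placeholder game log; players and yardage are illustrative only.
wr_games <- data.frame(
  player  = c("P1", "P2", "P3", "P4", "P5"),
  year    = c(2020, 2020, 2020, 2022, 2022),
  rec_yds = c(120, 269, 200, 266, 90),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n highest-yardage games in a given year.
top_games <- function(df, yr, n = 2) {
  in_year <- df[df$year == yr, ]
  head(in_year[order(-in_year$rec_yds), ], n)
}

top_2020 <- top_games(wr_games, 2020)
top_2022 <- top_games(wr_games, 2022)
print(top_2020)
print(top_2022)
```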

Tight End Receiving Yards Across Years

Now that I have explored the receiving yards of wide receivers from 2019 to 2022, I am interested in exploring the change in receiving yards of tight ends across years.

year mean_rec_yds
2019 18.80272
2020 17.57289
2021 17.35009
2022 17.09881

The average receiving yards of tight ends are more consistent from year to year than those of wide receivers, remaining around 17 each year, although the average was higher in 2019.

Figure 4: The Distribution of Receiving Yards for 2019, 2020, 2021, and 2022 for Tight Ends

When looking at Figure 4, there are similar ranges across years. There also seem to be several outliers each year. However, 2020 and 2021 have higher outliers than either of the other years.

Top Tight Ends

Similar to wide receivers, I am curious to see which tight ends the outliers are.

player position year team rec_yds
Darren Waller TE 2020 LVR 200
George Kittle TE 2020 SFO 183
Travis Kelce TE 2020 KAN 159
Darren Waller TE 2020 LVR 150
Travis Kelce TE 2020 KAN 136
player position year team rec_yds
Travis Kelce TE 2021 KAN 191
George Kittle TE 2021 SFO 181
Kyle Pitts TE 2021 ATL 163
George Kittle TE 2021 SFO 151
David Njoku TE 2021 CLE 149

Unlike wide receivers, there aren’t any extreme outliers, but there are several games that are significantly above average. The average receiving yards per tight end is 17-18, yet there are tight ends with games above 150 yards. When looking at the top 5 for the two outlier years, there are several players who appear more than once, such as Travis Kelce and George Kittle.

player avg_rec_yds
Travis Kelce 83.50909
Darren Waller 70.04545
George Kittle 66.82927
Mark Andrews 61.67347
Kyle Pitts 60.35294

When looking at the top tight ends with the best overall receiving yards, the data is consistent with the outliers in 2020 and 2021. Travis Kelce, Darren Waller, George Kittle, and Kyle Pitts have the highest average receiving yards across all games, so it makes sense those 4 make up many of the outliers.
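A per-player average like the table above can be sketched as a grouped mean followed by a sort. The example also counts games per player, since an average over very few games is less reliable. The players and values are placeholders, and whether the report applied any minimum-games filter is not stated in the text.

```r
# Placeholder tight end game log.
te_games <- data.frame(
  player  = c("P1", "P1", "P2", "P2", "P2", "P3"),
  rec_yds = c(80, 90, 60, 70, 80, 10),
  stringsAsFactors = FALSE
)

# Average receiving yards per player.
avg <- aggregate(rec_yds ~ player, data = te_games, FUN = mean)
names(avg)[2] <- "avg_rec_yds"

# Number of games behind each average.
avg$games <- as.vector(table(te_games$player)[avg$player])

# Keep the two players with the highest averages.
top_players <- head(avg[order(-avg$avg_rec_yds), ], 2)
print(top_players)
```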

It also makes sense that tight ends have a wider range of receiving yards than wide receivers since the position has two functions, as mentioned earlier. Since not all tight ends receive, it creates a high disparity between the average receiving yards among all tight ends and the average receiving yards among the top tight ends.

Tight Ends vs. Wide Receivers

Now that we have explored the receiving yards across years among tight ends and wide receivers individually, I am interested in comparing them during each year.

Figure 5: A side-by-side comparison of the total receiving yards for Tight Ends versus Wide Receivers for each year in the data

Figure 6: A side-by-side comparison of the average receiving yards of individual Tight Ends versus Wide Receivers across 4 years

Figure 5 and Figure 6 make it clear that wide receivers have a much higher amount of receiving yards than tight ends, both total and average. Neither of these graphs is surprising.

First, there are typically 3 wide receivers on the field for a team, compared to 1 tight end. This explains why the total receiving yards is higher. Second, since the positions function differently, it is understandable why the average receiving yards is higher for wide receivers.

Receiving Yards vs Other Receiving Variables

When looking at the variables, none of them seem to have an extremely high correlation with one another. The only two variables that look to be fairly strongly related are receiving yards and receiving long.

library(dplyr)
library(knitr)

# Correlation between receiving yards and receiving long for WRs and TEs
nfl_rec <- nfloffensiveplayers_new |>
  filter(position %in% c("WR", "TE")) |>
  select(rec_yds, rec_long) |>
  cor()
kable(nfl_rec)

rec_yds rec_long
rec_yds 1.0000000 0.8473774
rec_long 0.8473774 1.0000000

There was a correlation of about 0.85 between receiving yards and receiving long. This correlation makes sense, since the “receiving long” variable records the longest single reception the player had in the game. A long catch adds directly to the player’s total, so players with a longer longest reception tend to have more receiving yards.

Passing Analysis

Next, I want to examine passing.

Passing Yards Across Years

First, I am going to analyze the passing yards across years.

year mean_pass_yds
2019 212.3322
2020 204.7006
2021 199.5871
2022 202.2571

The average passing yards in the NFL are interesting. A ball has to be passed to be received, so I am surprised that the averages differ more from year to year in passing than in receiving. There was a downward trend in average passing yards from 2019 to 2021, but then the average went back up in 2022. I think the disparity between passing and receiving could be explained by the fact that there is only one quarterback on the field for a team, compared to multiple receivers.

The range of passing yards is pretty consistent, similar to the averages, although there is a small downward trend. There are almost no outliers, except for 1 extreme outlier in 2019 and 2 extreme outliers in 2021.

Top Quarterbacks

Now, I am interested in exploring who the 3 quarterback outliers are.

player position year team pass_yds
Jared Goff QB 2019 LAR 517
Dak Prescott QB 2019 DAL 463
Matt Schaub QB 2019 ATL 460
Jameis Winston QB 2019 TAM 458
Jameis Winston QB 2019 TAM 456
player position year team pass_yds
Joe Burrow QB 2021 CIN 525
Ben Roethlisberger QB 2021 PIT 501
Dak Prescott QB 2021 DAL 445
Lamar Jackson QB 2021 BAL 442
Derek Carr QB 2021 LVR 435

The outlier in 2019 was Jared Goff when he played for the Los Angeles Rams. The outliers in 2021 were Joe Burrow when he played for the Cincinnati Bengals and Ben Roethlisberger when he played for the Pittsburgh Steelers. These are outliers since each is more than 50 yards above the next-highest game in its year outside the outlier group (463 yards in 2019 and 445 yards in 2021).

Similar to the wide receiver stats, these games all set records. Burrow’s game was the 4th most passing yards of all time in an NFL game, Roethlisberger’s was the 5th most, and Goff’s was the 10th most. These records help explain why the outliers seem so significant.

Correlation Between Passing Yards and Other Passing Variables

Now that we have analyzed passing yards, I am interested in analyzing the relationship between passing yards and the other variables related to passing: pass completions, pass attempts, pass interceptions, passes sacked, and pass rating.

The heatmap shows that the three variables with the highest correlations to each other are pass completions, pass yards, and pass attempts. I am interested in further exploring the relationship among those 3 variables.
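The numbers behind a correlation heatmap come from a correlation matrix. Below is a minimal sketch with made-up quarterback games. The column names other than `pass_yds` are assumed rather than taken from the dataset, and the code only computes the matrix and finds its strongest pair, without drawing the heatmap.

```r
# Made-up quarterback game log; column names other than pass_yds are assumptions.
qb_games <- data.frame(
  pass_cmp = c(18, 25, 30, 22, 27),
  pass_att = c(30, 38, 45, 35, 40),
  pass_yds = c(190, 280, 350, 240, 310),
  pass_int = c(1, 0, 2, 1, 0)
)

# Pairwise Pearson correlations between all passing variables.
pass_cor <- cor(qb_games)
print(round(pass_cor, 2))

# Find the most strongly correlated pair, ignoring the diagonal of 1s.
off_diag <- pass_cor
diag(off_diag) <- NA
idx <- which(off_diag == max(off_diag, na.rm = TRUE), arr.ind = TRUE)[1, ]
strongest <- c(rownames(off_diag)[idx[["row"]]], colnames(off_diag)[idx[["col"]]])
print(strongest)
```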

Pass Completions vs. Pass Yards vs. Pass Attempts

First, I want to further explore the correlation between passing yards, pass completions, and pass attempts for quarterbacks.

I created a scatterplot matrix to explore the relationship between all 3 at the same time. The variables all have high correlations with each other, and the graphs show a positive relationship for every pair.

Although they are all similar, pass completions and pass attempts have a slightly higher correlation than the other pairs. Pass yards and pass completions have the second highest correlation, and pass yards and pass attempts have the lowest correlation.

Rushing Analysis

Finally, I am going to explore rushing in this dataset.

Distribution of Rushing Yards

There are two positions that primarily rush the ball: running backs and fullbacks. To start, I want to examine the distribution of rushing yards of both positions.

The distributions of rushing yards per game and average rushing yards per player have similar overall shapes. They are both right-skewed and unimodal. However, the frequencies for average rushing yards are lower than for total yards, although that is to be expected.

I am personally not surprised by the right skew. Rushing is generally not meant to gain a lot of yards, so it makes sense that the majority of both rushing yards per game and average rushing yards per player are in the 0-25 range. However, some plays end up gaining a large number of rushing yards, which explains the right skew and the outliers on both graphs. There is one outlier in average yards, which will be explored below in the “Top Running Backs” section.

Distribution of Rushing Yards by Position

Next, I want to compare the distribution of rushing yards among the 2 positions.

These distributions produce consistent results. In both graphs, running backs have higher rushing yards than fullbacks. At first, these results surprised me since running backs and fullbacks are virtually the same position. However, after analyzing it more and looking at the dataset, I realized that teams have significantly more running backs than fullbacks and they use running backs more. This explains the difference in both total yards and average yards.

Analysis of Rushing Yards by Year

Now that we have analyzed the distribution of rushing yards, I want to examine how rushing yards have changed over the four years in the data. Since running backs and fullbacks are extremely similar positions, I am going to summarize both positions together for this analysis.

year mean_rush_yds
2019 29.41245
2020 29.64167
2021 29.61589
2022 29.08834

There is extremely high consistency when it comes to rushing yards in the NFL. The average remained around 29.4 yards across the 4 years, never differing by more than a yard. The average is also lower than the passing averages and slightly lower than the wide receiver receiving averages, which makes sense since rushing generally does not lead to as many yards gained as receiving does.

There is a similar range among all four years. However, there are many outliers in each year. 2021 has 2 outliers that are higher than any other year.

Top Running Backs

Similar to the other positions, I am interested in seeing which running backs performed high enough to be the outliers.

player position year team rush_yds
Jonathan Taylor RB 2021 IND 253
Derrick Henry RB 2021 TEN 250
Dalvin Cook RB 2021 MIN 205
Jonathan Taylor RB 2021 IND 185
Derrick Henry RB 2021 TEN 182

Jonathan Taylor had 253 yards in a single game while playing for the Indianapolis Colts, and Derrick Henry had 250 yards while playing for the Tennessee Titans. Both games are more than 200 yards above the average and at least 45 yards above the next-highest game (205 yards), which explains why they stand out in the graph.

Unsurprisingly, these games also set records in the NFL. Taylor had the 9th most rushing yards in an individual game in NFL history, and Henry had the 13th. It is interesting to compare the stand-out outliers to NFL records, since it puts into perspective how extreme the outliers truly are.

player avg_rush_yds
Derrick Henry 114.88636
Jonathan Taylor 92.66667

These results are consistent with the outliers, since Jonathan Taylor and Derrick Henry not only had the two highest performing games, but were also the top 2 highest performing running backs overall in the four years covered by the data.

Rush Yards vs. Other Rush Variables

Unlike the passing variables, the rushing variables are not as heavily correlated with each other. The only pairs that seem to be fairly strongly correlated are rushing yards and rushing attempts, and rushing yards and rushing yards before contact.

Rush Attempts vs. Rush Yards

Now, I want to further explore the correlation between rushing yards and rush attempts for running backs.

rush_att rush_yds
rush_att 1.0000000 0.8748997
rush_yds 0.8748997 1.0000000

There is a correlation of about 0.87 between rushing yards and rush attempts. This is consistent with the correlation shown by the heat map. It also makes sense that these two variables are related. The more attempts someone has at rushing, the more yards they will tend to get. That said, it also makes sense that the correlation is not extremely close to 1, since an attempt doesn’t guarantee that the running back will gain yards on the play.

Rush Yards vs. Rush Yards Before Contact

Next, I want to explore the relationship between rush yards and rush yards before contact.

rush_yds rush_yds_before_contact
rush_yds 1.0000000 0.9006858
rush_yds_before_contact 0.9006858 1.0000000

There is also a heavy correlation between these two variables, at about 0.90. However, I am surprised the correlation is not even higher. One of the variables represents the total rush yards in a game, and the other variable represents the total rush yards in a game before the player is hit. Normally, when the player gets hit, it means the play is over. Because of this, I am surprised the correlation is not closer to 1.
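One way to read these coefficients is to square them: in a simple linear fit with one predictor, the squared correlation is the share of the variation in rushing yards that the other variable accounts for. The short sketch below applies that to the two coefficients reported in the tables above.

```r
# Correlations with rush_yds, copied from the two tables above.
r <- c(rush_att = 0.8748997, rush_yds_before_contact = 0.9006858)

# Squared correlation: share of variance explained in a one-predictor linear fit.
r_squared <- round(r^2, 3)
print(r_squared)
```

By this measure, rush attempts account for roughly 77% of the variation in rushing yards and yards before contact for roughly 81%, which leaves a noticeable share tied to other factors such as yards gained after contact.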

Overall Analysis

Now that we have explored rushing, passing, and receiving more deeply, I am interested in analyzing the dataset as a whole.

Frequency of Positions

First, I am interested in looking at how common each position is in the dataset.

Wide receivers are the most common offensive position in the dataset, which makes sense since there are more wide receivers on the field at a time than the other positions. Quarterback was the least common position, which is also not surprising since teams normally only have 1 backup quarterback, and their main “franchise” quarterback remains on the field at all times. This is unlike the other positions where the players consistently switch out, so they need more backups.

I was surprised that there were more tight ends in the dataset than running backs. I see more running backs switch out than tight ends when I’m watching football, so I always assumed there would be more backup RBs than TEs. However, this assumption is untrue as shown by the pie chart.

In this pie chart, there are more tackles than centers, which makes sense since there are more tackles on the field than centers at a time. It is surprising that there are more tackles than guards since both positions have 2 on the field at a time. I would assume it’s because tackles get more easily injured since they are on the ends of the offensive line.
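The counts behind a pie chart like this can be checked directly with a frequency table. The sketch below uses a made-up vector of position labels, so the shares it prints are illustrative and not those of the dataset.

```r
# Made-up position labels, one per player-game row.
positions <- c("WR", "WR", "WR", "RB", "TE", "TE", "QB")

# Rows per position, most common first.
counts <- sort(table(positions), decreasing = TRUE)

# Share of all rows for each position, as a percentage.
shares <- round(100 * prop.table(counts), 1)
print(counts)
print(shares)
```

Since each row is one player in one game, these counts reflect how many players at a position appear in games, not how many distinct players hold that position.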

Frequency of Positions by Year

All of the positions remained fairly consistent throughout each year, except for the decrease in 2022. This decrease can likely be explained by the fact that the dataset was posted to Kaggle in August 2022, so it cannot include games played later that year.

Conclusion

Throughout this exploration, I looked at rushing, receiving, and passing. I found that the average yards of each has remained consistent across the four years in the data, although passing yards has varied the most. It was surprising that passing yards varied more than receiving yards, since the two should in theory have a high correlation with one another. Each of the variables also had outliers. When these outliers were explored further, it was found that they corresponded to individual players who set NFL records with those numbers. Two of the variables, receiving and rushing, centered around two different positions. In both cases, one position outperformed the other, although for different reasons. Wide receivers had higher receiving stats than tight ends since not all tight ends receive, and running backs had higher rushing stats than fullbacks since running backs are used more. I had expected the difference in receiving yards since I follow the tight end position, but I did not expect the difference in rushing yards.

When looking at the three categories, there were high correlations between variables in each one. Receiving yards was heavily correlated with receiving long, and rushing yards was heavily correlated with rushing yards before contact and rush attempts. Additionally, passing yards, pass attempts, and pass completions all had heavy correlations with each other. These correlations all made sense logically, and I expected most of them.

In the future, it would be interesting to take the player stats analysis and see how it relates to the outcome of each game the players played in. This comparison could be used to identify trends between certain rushing, passing, or receiving stats and whether or not the team won, which could be very useful for offensive coaches.

References

Fernandez, Daniel. “NFL Offensive Stats 2019 - 2022.” Kaggle, 23 Aug. 2022, www.kaggle.com/datasets/dtrade84/nfl-offensive-stats-2019-2022.

“NFL Passing Yards Single Game Leaders.” Pro Football Reference, Sports Reference, www.pro-football-reference.com/leaders/pass_yds_single_game.htm. Accessed 6 Dec. 2023.

“NFL Receiving Yards Single Game Leaders.” Pro Football Reference, Sports Reference, www.pro-football-reference.com/leaders/rec_yds_single_game.htm. Accessed 6 Dec. 2023.

“NFL Rushing Yards Single Game Leaders.” Pro Football Reference, Sports Reference, www.pro-football-reference.com/leaders/rush_yds_single_game.htm. Accessed 6 Dec. 2023.